Experimental Fast-Tracking of Morphological Analysers for Nguni Languages
Abstract
The development of natural language processing (NLP) components is resource-intensive, which justifies exploring ways of reducing development time and effort. This paper addresses the experimental fast-tracking of the development of finite-state morphological analysers for Xhosa, Swati and (Southern) Ndebele by using an existing prototype of a morphological analyser for Zulu. The research question is whether fast-tracking is feasible across the language boundaries between these closely related varieties. The objective is a thorough assessment of the recognition rates that the Zulu morphological analyser yields for the three related languages. The strategy is to use fast-tracking techniques that consist of several cycles of the following steps: applying the analyser to corpus data from all languages, identifying (types of) failures, and implementing the corresponding changes in the analyser. The tests show that the high degree of shared typological properties and formal similarities among the Nguni varieties warrants a modular fast-tracking approach. The word forms in these languages that the Zulu analyser recognized were mostly adequately interpreted. The focus therefore lies on providing the necessary adaptations based on an analysis of the failure output for each language. As a result, the development of analysers for Xhosa, Swati and Ndebele is considerably faster than the creation of the Zulu prototype. The paper concludes with comments on the feasibility of the experiment and the results of the evaluation.
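The cycle described in the abstract (apply the analyser to corpus data, identify failures, adapt the analyser) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `recognition_report` and `make_toy_analyser` are hypothetical names, the analyser is a plain dictionary lookup standing in for a finite-state analyser, and the tokens are placeholders rather than real Nguni word forms. It is not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical analyser interface: word form -> list of analyses (empty if unrecognised).
Analyser = Callable[[str], List[str]]


def recognition_report(analyse: Analyser, tokens: Iterable[str]) -> Tuple[float, Counter]:
    """Apply the analyser to every token and return
    (recognition rate, frequency count of unrecognised forms)."""
    total = 0
    failures: Counter = Counter()
    for token in tokens:
        total += 1
        if not analyse(token):
            failures[token] += 1
    recognised = total - sum(failures.values())
    rate = recognised / total if total else 0.0
    return rate, failures


def make_toy_analyser(lexicon: Dict[str, List[str]]) -> Analyser:
    """Toy stand-in for a finite-state analyser: a dictionary lookup (assumption)."""
    def analyse(word: str) -> List[str]:
        return lexicon.get(word, [])
    return analyse


# Placeholder data, not real word forms.
lexicon = {"form_a": ["analysis_a"]}
corpus = ["form_a", "form_b", "form_b", "form_c"]

# Cycle 1: apply the analyser and inspect the most frequent failures.
rate_before, failures = recognition_report(make_toy_analyser(lexicon), corpus)

# Adapt the analyser for the most frequent failure, then re-run (cycle 2).
most_frequent_failure = failures.most_common(1)[0][0]
lexicon[most_frequent_failure] = ["analysis_b"]
rate_after, _ = recognition_report(make_toy_analyser(lexicon), corpus)

print(rate_before, rate_after)  # 0.25 0.75
```

Sorting failures by frequency is one plausible way to prioritise adaptations; the abstract itself only states that (types of) failures are identified per language.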
Similar resources
Experimental Bootstrapping of Morphological Analysers for Nguni Languages
This paper addresses the experimental bootstrapping of the development of broad-coverage finite-state morphological analysers for Xhosa, Swati and (Southern) Ndebele by using an existing prototype of a morphological analyser for Zulu. These languages are both morphologically complex and resource-scarce. The research question is whether bootstrapping is feasible across the language boundaries be...
Semi-automated extraction of morphological grammars for Nguni with special reference to Southern Ndebele
A finite-state morphological grammar for Southern Ndebele, a seriously under-resourced language, has been semi-automatically obtained from a general Nguni morphological analyser, which was bootstrapped from a mature hand-written morphological analyser for Zulu. The results for Southern Ndebele morphological analysis, using the Nguni analyser, are surprisingly good, showing that the Nguni langua...
Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages
A considerable amount of work has been put into development of stemmers and morphological analysers. The majority of these approaches use hand-crafted suffix-replacement rules but a few try to discover such rules from corpora. While most of the approaches remove or replace suffixes, there are examples of derivational stemmers which are based on prefixes as well. In this paper we present a rule-...
Pitch modelling for the Nguni languages
Although the complexity of prosody is widely recognised, the lack of widely-accepted descriptive standards for prosodic phenomena has meant that prosodic systems for most of the languages of the world have, at best, been described in impressionistic rule-based terms. For the languages of Southern Africa, the deficiencies in our modelling capabilities are acute. Little work of a quantitative nat...
Exploiting Cross-Linguistic Similarities in Zulu and Xhosa Computational Morphology
This paper investigates the possibilities that cross-linguistic similarities and dissimilarities between related languages offer in terms of bootstrapping a morphological analyser. In this case an existing Zulu morphological analyser prototype (ZulMorph) serves as basis for a Xhosa analyser. The investigation is structured around the morphotactics and the morphophonological alternations of the ...